Abstract

We consider the quasispecies description of a population evolving
in both the "master sequence" landscape (where a single sequence
is evolutionarily preferred over all others) and the REM landscape
(where the fitnesses of different sequences are independent, identically
distributed random variables). We show that, in both cases, the error
threshold is analogous to a first-order thermodynamical transition,
at which the overlap between the average genotype and the optimal one
drops discontinuously to zero.

An equation describing the behavior of populations of self-reproducing
entities, subject to natural selection and to mutations, was introduced by
Manfred Eigen in 1971 [1]. The inheritable structure ("genotype") of these
entities is described by a sequence of length $L$ of symbols belonging to an
alphabet of $k$ characters ($k = 4$ in the case of nucleic acids). In the
simple case in which one such sequence is selectively preferred with respect
to all others, Eigen was able to show that his equation (the quasispecies
equation) implies a transition (called the error threshold) between two
different behaviors:

• At low mutation rate, the population is made up, at equilibrium, of
sequences close to the preferred one (master sequence): it therefore
forms a quasispecies.

• At higher mutation rate, the distribution becomes uniform over
sequence space.

This behavior is reminiscent of a phase transition in statistical mechanics.
Indeed, I. Leuthausser [2] showed that the quasispecies equation is
equivalent to a statistical mechanical model. In this language, the error
threshold corresponds to a thermodynamical transition of the statistical
mechanical system.

Later, Tarazona [3] qualified this correspondence by pointing out that
the properties which describe the behavior of the evolving population
correspond to surface observables of the statistical mechanical model. In
particular he argued that in the simple situation mentioned above, with a
single master sequence, while the naive application of statistical mechanics
predicts a first-order phase transition, the transition is continuous for the
evolutionary model. The discrepancy between the two predictions was
attributed to a surface phenomenon akin to wetting [4], where the disordered
state is favored on a surface layer whose thickness diverges as the phase
transition is approached.

The statistical mechanics approach to the quasispecies equation was later
used by Franz et al. [5] to solve it in a "rugged fitness landscape" (in which
the selective value of each different sequence is an independent random
variable) modelled by Derrida's Random Energy Model (REM) [6]. In this case,
a first-order transition between the quasispecies and the uniform behavior
was found. This result has been challenged by P. G. Higgs and G. Woodcock [7].

The aim of this letter is to point out that the discrepancy between
predictions is due to the fact that one's attention is directed towards
different observables in the different cases: a careful consideration of the
"infinite genome" ($L \to \infty$) limit, necessary to obtain a sharp phase
transition, shows that, in the "master sequence" model, the error threshold
is a first-order phase transition. This does not rule out the fact that, in
the same limit, the fraction of individuals whose genotype is equal to the
master sequence (or, for that matter, is at any finite Hamming distance away
from it) goes smoothly to zero at the transition. In particular, the "wetting
phenomenon" described by Tarazona does not take place, at least in this case.
Similar results hold for the rugged fitness landscape.

Let us consider the $k = 2$ "master sequence" model, defined as follows. The
genotype $s$ is described by $L$ units $s_i = \pm 1$, $i = 1, \ldots, L$. The
quasispecies equation, which describes the evolution of the fraction $x_s(t)$
of individuals having the genotype $s$ at generation $t$, takes the form

\[
x_s(t+1) = \frac{1}{Z(t)} \sum_{s'} Q_{ss'}\, w_{s'}\, x_{s'}(t), \tag{1}
\]

where $w_s$ is the fitness of sequence $s$ and $\|Q_{ss'}\|$ is the mutation
matrix. The normalization factor $Z(t)$ is given by

\[
Z(t) = \sum_{s} w_s\, x_s(t).
\]

The matrix element $Q_{ss'}$ is the conditional probability that a
reproduction event of an individual with genotype $s'$ produces one with
genotype $s$. If one assumes pointwise mutations with uniform probability,
one has

\[
Q_{ss'} = \mu^{d_H(s,s')}\, (1-\mu)^{L - d_H(s,s')}, \tag{2}
\]
where

\[
d_H(s,s') = \frac{1}{2} \sum_{i=1}^{L} \left( 1 - s_i s'_i \right) \tag{3}
\]

is the Hamming distance between the sequences $s$ and $s'$, and $\mu$ is the
mutation rate. The "master sequence" is denoted by $s^\circ = (s_i^\circ)$.
The fitness $w_s$ is then given by

\[
w_s = \begin{cases}
\exp(kL), & \text{if } s = s^\circ; \\
1, & \text{otherwise.}
\end{cases} \tag{4}
\]

In this expression, $k > 0$ is a "selective" inverse temperature. We have
chosen to take $\ln w_{s^\circ} \propto L$ in order to obtain the infinite
genome limit in close analogy with the thermodynamical limit. We shall
discuss later the scaling considered by Eigen [1] and followers, in which
$w_{s^\circ} \to \text{const.}$
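The model defined by eqs. (1)-(4) can be iterated directly for a very short genome. The following is a minimal sketch under stated assumptions, not the procedure used for the figures: the helper names `build_model` and `iterate` are hypothetical, the master sequence is arbitrarily taken to be the all-(+1) sequence, and the full $2^L \times 2^L$ mutation matrix is stored, which is only feasible for small $L$.

```python
import itertools
import math
import numpy as np

def build_model(L, mu, k):
    """Return (Q, w, master) for the k = 2 master-sequence model (illustrative helper)."""
    # All 2**L genotypes as rows of +/-1 units; the first row is (+1, ..., +1),
    # which this sketch takes as the master sequence (an arbitrary choice).
    seqs = np.array(list(itertools.product([1, -1], repeat=L)))
    # Hamming distance, eq. (3): d_H = (1/2) sum_i (1 - s_i s'_i).
    d = (L - seqs @ seqs.T) // 2
    # Mutation matrix, eq. (2).
    Q = mu ** d * (1.0 - mu) ** (L - d)
    # Fitness, eq. (4): exp(k L) for the master sequence, 1 otherwise.
    w = np.ones(len(seqs))
    master = 0
    w[master] = math.exp(k * L)
    return Q, w, master

def iterate(Q, w, x0, generations):
    """Apply eq. (1) repeatedly, starting from the distribution x0."""
    x = x0.copy()
    for _ in range(generations):
        y = Q @ (w * x)   # sum over s' of Q_{ss'} w_{s'} x_{s'}(t)
        x = y / y.sum()   # y.sum() equals Z(t), since each column of Q sums to 1
    return x

Q, w, master = build_model(L=6, mu=0.01, k=math.log(4.0 / 3.0))
x0 = np.zeros(len(w))
x0[master] = 1.0          # initial population made of master sequences only
x = iterate(Q, w, x0, generations=200)
print("weight of the master sequence:", x[master])
```

Storing the full matrix keeps the sketch close to eq. (1); for longer genomes one would rather group sequences by their Hamming distance from the master sequence.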
As pointed out by Leuthausser and Tarazona [2, 3], the solution of eq. (1)
can be expressed in terms of a statistical mechanical model. Let us consider
a population evolving for $T$ generations from an initial condition in which
$x_s(0) = \delta_{s s^\circ}$. One has

\[
\begin{aligned}
x_s(T) &= \frac{1}{Z} \sum_{s(1), s(2), \ldots, s(T-1)}
  Q_{s(T)s(T-1)}\, w_{s(T-1)} \cdots Q_{s(1)s^\circ}\, w_{s^\circ} \\
&= \frac{1}{Z} \sum_{s(1), s(2), \ldots, s(T-1)}
  \exp\left[ \sum_{t=1}^{T} \left( \ln Q_{s(t)s(t-1)} + \ln w_{s(t-1)} \right) \right].
\end{aligned} \tag{5}
\]

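We have set $s(T) = s$, $s(0) = s^\circ$, and we have defined the
normalization constant $Z$ by

\[
\begin{aligned}
Z &= \sum_{s(1), s(2), \ldots, s(T-1), s(T)}
  \exp\left[ \sum_{t=1}^{T} \left( \ln Q_{s(t)s(t-1)} + \ln w_{s(t-1)} \right) \right] \\
&= \sum_{s(1), s(2), \ldots, s(T-1), s(T)} \exp\left( -H\{s(t)\} \right).
\end{aligned}
\]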

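The last line defines the symbol $H$. It now turns out that, for the "master
sequence" model,

\[
\begin{aligned}
-H\{s(t)\} &= \sum_{t=1}^{T} \left( \ln Q_{s(t)s(t-1)} + \ln w_{s(t-1)} \right) \\
&= TL\, [\ln(1-\mu) - \beta]
 + \sum_{t=1}^{T} \left( \beta \sum_{i=1}^{L} s_i(t)\, s_i(t-1)
 + kL\, \delta_{s(t-1), s^\circ} \right),
\end{aligned} \tag{6}
\]

where the "mutation" inverse temperature $\beta$ is defined by

\[
\beta = \frac{1}{2} \ln \frac{1-\mu}{\mu}. \tag{7}
\]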

The expression (6) looks like the Hamiltonian (times the inverse temperature)
of an Ising system of $TL$ spins, arranged in $T$ layers of $L$ spins each.
The interlayer interactions, representing the correlation effects due to
heredity, are proportional to $\beta$, while the intralayer interactions,
representing the selection, are proportional to $k$. Tarazona [3] pointed out
that the intralayer interaction term corresponding to layer $T$ is lacking in
this expression: the system therefore corresponds to a statistical mechanical
model with a free surface.
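The interlayer coupling comes entirely from the mutation matrix: inserting $d_H = (L - \sum_i s_i s'_i)/2$ into eq. (2) gives $\ln Q_{ss'} = L[\ln(1-\mu) - \beta] + \beta \sum_i s_i s'_i$, with $\beta$ as in eq. (7). The sketch below checks this identity numerically on random sequences; the function names are hypothetical and introduced only for this illustration.

```python
import math
import numpy as np

def log_Q_direct(s, sp, mu):
    """ln Q_{ss'} from eq. (2), using the Hamming distance explicitly."""
    L = len(s)
    d = int(np.sum(s != sp))
    return d * math.log(mu) + (L - d) * math.log(1.0 - mu)

def log_Q_ising(s, sp, mu):
    """The same quantity written as a constant plus an Ising coupling beta * sum_i s_i s'_i."""
    L = len(s)
    beta = 0.5 * math.log((1.0 - mu) / mu)   # eq. (7)
    return L * (math.log(1.0 - mu) - beta) + beta * float(np.dot(s, sp))

rng = np.random.default_rng(1)               # fixed seed, for reproducibility only
s = rng.choice([-1, 1], size=12)
sp = rng.choice([-1, 1], size=12)
print(log_Q_direct(s, sp, 0.1), log_Q_ising(s, sp, 0.1))
```

The two printed numbers should agree to rounding error for any pair of sequences and any $0 < \mu < 1$.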

It is now easy to see that, in the limit $L \to \infty$ followed by
$T \to \infty$, a phase transition separates an ordered ("frozen") regime, in
which one has $s(t) = s^\circ$ for all layers $t$ except the last one, from a
disordered ("free") one, in which all sequences $s$ have the same probability
and the system behaves like a collection of $L$ independent one-dimensional
Ising models at temperature $\beta^{-1}$. The transition line can be obtained
by comparing the free energies $F$, defined by $F = -\ln Z$, of the two
regimes (the configuration-independent first term of eq. (6) is common to
both and is omitted):

1. For the ordered regime one has

\[
F_1 = -TL\, (k + \beta) + \text{boundary terms}; \tag{8}
\]

2. For the disordered regime one has

\[
F_2 = -TL \ln(2\cosh\beta) + \text{boundary terms}, \tag{9}
\]

corresponding to the free energy per spin of a one-dimensional Ising
model.

The transition line is given by the condition $F_1 = F_2$ (where the boundary
terms are neglected) and reads

\[
k_t(\beta) = \ln(2\cosh\beta) - \beta. \tag{10}
\]

In terms of the mutation rate $\mu$, this corresponds to
$k_t = -\ln(1-\mu)$, as originally obtained by Eigen [1].
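The equivalence of the two forms of the transition line follows from eq. (7), since $\ln(2\cosh\beta) - \beta = \ln(1 + e^{-2\beta})$ and $e^{-2\beta} = \mu/(1-\mu)$. A short numerical check, with hypothetical function names chosen for this sketch:

```python
import math

def beta_from_mu(mu):
    """Mutation inverse temperature, eq. (7)."""
    return 0.5 * math.log((1.0 - mu) / mu)

def k_transition(beta):
    """Transition line in terms of beta, eq. (10)."""
    return math.log(2.0 * math.cosh(beta)) - beta

def k_transition_from_mu(mu):
    """The same line written in terms of the mutation rate."""
    return -math.log(1.0 - mu)

for mu in (0.01, 0.1, 0.25, 0.4):
    print(mu, k_transition(beta_from_mu(mu)), k_transition_from_mu(mu))
```

For instance, a threshold at $\mu_t = 0.25$ requires $k = -\ln(3/4) = \ln(4/3)$.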

We now show in more detail that all layers but the last one (corresponding
to $t = T$) are "frozen" for $k > k_t(\beta)$, in the sense that the only
configurations which contribute in the infinite genome limit are those for
which $s(t) = s^\circ$ for $t < T$. Let us consider the last-but-one layer
($t = T-1$), and let us momentarily assume that the preceding layer is
frozen. The last layer is free, because there are no contributions from the
intralayer interactions at $t = T$ [3]. There are two possibilities for
$s(T-1)$:

1. "Frozen": $s(T-1) = s^\circ$: this yields a contribution
$\exp[L(k+\beta)] \times (2\cosh\beta)^{L}$ to the partition sum; the second
factor comes from the sum over the configurations of the last layer;

2. "Free": summing also over the configurations of the last-but-one layer,
one obtains the contribution $(2\cosh\beta)^{2L}$.

Because one has, by hypothesis, $k > k_t(\beta) = \ln(2\cosh\beta) - \beta$,
the first contribution dominates for $L \to \infty$. By induction, one can
show in the same way that there cannot be a layer $t_0 < T$ separating
"frozen" layers (for $t < t_0$) from "free" ones (for $t > t_0$).
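The dominance of the frozen contribution can be made explicit by taking the ratio of the two contributions listed above and using eq. (10):

```latex
\[
\frac{e^{L(k+\beta)}\,(2\cosh\beta)^{L}}{(2\cosh\beta)^{2L}}
= \exp\bigl\{ L\,[\,k + \beta - \ln(2\cosh\beta)\,] \bigr\}
= \exp\bigl\{ L\,[\,k - k_t(\beta)\,] \bigr\}.
\]
```

The ratio grows exponentially with $L$ for $k > k_t(\beta)$ and vanishes exponentially for $k < k_t(\beta)$, so that only one of the two possibilities survives in the infinite genome limit.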

Let us now define, following Tarazona [3], the order parameter $m$ as the
overlap of the average sequence $(\langle s_i \rangle)$ with the master
sequence $s^\circ$:

\[
m = \frac{1}{L} \sum_{i=1}^{L} \langle s_i \rangle\, s_i^\circ
  = 1 - 2\, \langle d_H(s, s^\circ) \rangle / L. \tag{11}
\]

The angular brackets denote the population average:

\[
\langle A(s) \rangle = \sum_{s} x_s\, A(s), \tag{12}
\]

where we have taken into account that $\sum_s x_s = 1$.

In the ordered phase, all layers but the last one are frozen to the master 
sequence. It is then a simple matter to show that 

\[
m = \tanh\beta = 1 - 2\mu. \tag{13}
\]

On the other hand, $m = 0$, obviously, in the disordered phase. We have thus
obtained the result that the phase transition is of first order, and that $m$
drops discontinuously from $1 - 2\mu$ to 0 as $k$ falls below the transition
value $k_t$. Let us also remark that eq. (13) predicts that $m \to 0$ for
$\beta \to 0$, even for $k > k_t(0) = \ln 2$, as it is reasonable to expect
on intuitive grounds.
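The value of $m$ in the ordered phase follows from a one-line computation. With the last-but-one layer frozen at $s^\circ$, each unit of the last layer is independent and coupled only to $s_i^\circ$, so that:

```latex
\begin{align*}
P(s_i = s_i^\circ) &= \frac{e^{\beta}}{2\cosh\beta}
  = \frac{1}{1 + e^{-2\beta}} = 1 - \mu, \\
\langle s_i \rangle\, s_i^\circ &= (1-\mu) - \mu = 1 - 2\mu = \tanh\beta .
\end{align*}
```

In evolutionary terms, the population at generation $T$ consists of the one-generation offspring of master sequences, each unit of which is copied correctly with probability $1 - \mu$.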

This analysis is supported by the numerical solution of the quasispecies
equation for finite $L$. We show in fig. 1 the order parameter as a function
of $\mu$ for different values of $L$. The value of $k$ is such that the error
threshold takes place for $\mu = \mu_t = 0.25$. One clearly sees that the
curve approaches a discontinuous behavior as $L$ increases, contrary to the
statements contained in ref. [3]. Let us remark that, properly speaking, the
weight $x_{s^\circ}(T)$ of the master sequence in the population approaches 0
in the thermodynamical limit ($L \to \infty$; $k, \beta = \text{const.}$).
Nevertheless the population forms a bona fide quasispecies, in the sense of
ref. [1].
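A finite-$L$ solution of this kind can be sketched as follows. This is an illustrative reconstruction, not necessarily the procedure used for fig. 1: it assumes that, starting from a pure master-sequence population, sequences can be grouped by their Hamming distance $d$ from $s^\circ$ (the fitness depends only on whether $d = 0$, and mutations act uniformly on all sites), and it finds the stationary state by plain iteration of eq. (1). The function names and the number of generations are hypothetical choices.

```python
import math
import numpy as np

def binom_pmf(n, p):
    """Binomial probabilities of j successes out of n, for j = 0..n."""
    return np.array([math.comb(n, j) * p**j * (1.0 - p)**(n - j)
                     for j in range(n + 1)])

def class_mutation_matrix(L, mu):
    """Q[d, d2]: probability that a parent at Hamming distance d2 from the
    master sequence produces an offspring at distance d."""
    Q = np.zeros((L + 1, L + 1))
    for d2 in range(L + 1):
        still_wrong = binom_pmf(d2, 1.0 - mu)   # mismatched sites copied unchanged
        newly_wrong = binom_pmf(L - d2, mu)     # matching sites that mutate
        Q[:, d2] = np.convolve(still_wrong, newly_wrong)
    return Q

def order_parameter(L, mu, k, generations=500):
    """Order parameter m of eq. (11) after iterating eq. (1) on distance classes."""
    Q = class_mutation_matrix(L, mu)
    w = np.ones(L + 1)
    w[0] = math.exp(k * L)      # eq. (4): only the class d = 0 is the master sequence
    x = np.zeros(L + 1)
    x[0] = 1.0                  # start from master sequences only
    for _ in range(generations):
        x = Q @ (w * x)
        x /= x.sum()
    mean_d = float(np.arange(L + 1) @ x)
    return 1.0 - 2.0 * mean_d / L

k = math.log(4.0 / 3.0)         # threshold at mu_t = 0.25 for L -> infinity
m_ordered = order_parameter(40, 0.05, k)
m_disordered = order_parameter(40, 0.45, k)
print(m_ordered, m_disordered)
```

Well inside the ordered phase the result should be close to $1 - 2\mu$, and well inside the disordered phase close to 0. Close to the threshold the iteration converges slowly, so more generations (or a direct eigenvector computation) would be needed there.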

Eigen [1], Leuthausser [2], and Tarazona [3] have considered a situation
in which the fitness ratio $w_{s^\circ}/w_s$ is kept constant as $L$
increases. In this case, if the mutation rate $\mu$ (and hence $\beta$) is
kept constant, one eventually crosses over smoothly to a "disordered" regime,
independently of the value of this ratio. This is the point that Eigen wanted
to make when he introduced the quasispecies equation, back in 1971: that the
error threshold prevents biological information from being maintained if the
genome length exceeds a certain value.

In this situation, there is no sharp phase transition, and the question
whether it is of first or second order is pointless. However, even in this
case, one can obtain a phase transition in the limit $L \to \infty$, if one
keeps constant the average number $\mu L$ of mutations. This corresponds to
taking $\beta \propto \ln L$ (more precisely, $e^{2\beta} \propto L$). It is
possible to solve the problem in this limit, and the results coincide with
what one obtains by taking the same limit in the equations we have written
above. In particular, the transition is still of first order, but now the
order parameter $m$ jumps from 1 to 0: just above the transition, the whole
population lies a finite Hamming distance away from the master sequence
(even though the weight of the master sequence goes smoothly to zero).
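The scaling behind these statements can be written out explicitly. In the lines below, $c = \mu L$ and $\sigma = w_{s^\circ}/w_s = e^{kL}$ are notation introduced only for this illustration; both are kept fixed as $L \to \infty$:

```latex
\begin{align*}
\beta &= \tfrac{1}{2}\ln\frac{1-\mu}{\mu}
       = \tfrac{1}{2}\ln\frac{L}{c} + O(1/L), \\
m &= \tanh\beta = 1 - 2\mu = 1 - \frac{2c}{L} \;\to\; 1, \\
L\,k_t &= -L \ln\Bigl(1 - \frac{c}{L}\Bigr) \;\to\; c .
\end{align*}
```

The threshold is therefore located at $\mu_t L = kL = \ln\sigma$, and the value of $m$ on the ordered side tends to 1.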

On the other hand, this behavior does not contradict the fact that the weight
$x_{s^\circ}(T)$ of the master sequence (which does not vanish if $\mu$ is
small enough) approaches 0 continuously at the error threshold, as exhibited
by fig. 2. The limit behavior is indeed given by [1]

\[
x_{s^\circ} = \begin{cases}
1 - u/u_t, & \text{for } u < u_t; \\
0, & \text{otherwise;}
\end{cases} \tag{14}
\]

where $u = 1 - \exp(-\mu L)$ is the total mutation probability, and $u_t$ is
the corresponding transition value. However, even in the infinite genome
limit, the whole population is the offspring of master sequence individuals
at each generation, and has therefore a finite overlap with the master
sequence, as soon as one is above the transition.

Let us now consider the REM fitness landscape. In this case the fitness
$w_s$ is given by

\[
w_s = \exp\left( -k E(s) \right), \tag{15}
\]

where the "energies" $E(s)$ of different sequences are independent normally
distributed random variables, with zero average and variance equal to $L/2$.
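To make the REM landscape concrete, the sketch below draws one sample of it for a short genome and builds the fitness $w_s = \exp(-kE(s))$. The function names, the seed, and the values $L = 16$ and $k = 0.5$ are illustrative assumptions. It also compares the lowest sampled energy with $-L\sqrt{\ln 2}$, the known large-$L$ value of the REM ground-state energy for this variance; at such a small $L$ the agreement is only rough.

```python
import math
import numpy as np

def sample_rem_energies(L, rng):
    """One REM sample: 2**L independent Gaussian energies, mean 0, variance L/2."""
    return rng.normal(loc=0.0, scale=math.sqrt(L / 2.0), size=2**L)

def rem_fitness(energies, k):
    """Fitness w_s = exp(-k E(s)) for every sequence."""
    return np.exp(-k * energies)

rng = np.random.default_rng(0)   # fixed seed, for reproducibility only
L = 16
E = sample_rem_energies(L, rng)
w = rem_fitness(E, k=0.5)

# Ratio of the lowest sampled energy to the large-L estimate -L * sqrt(ln 2).
ratio = E.min() / (-L * math.sqrt(math.log(2.0)))
print("lowest energy / (-L sqrt(ln 2)):", ratio)
```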
We have correspondingly, up to a configuration-independent factor,

\[
Z = \sum_{s(1), \ldots, s(T)} \exp\left[ \sum_{t=1}^{T} \left(
  -k E(s(t-1)) + \beta \sum_{i=1}^{L} s_i(t)\, s_i(t-1) \right) \right], \tag{16}
\]

where, as before, we assume that $s(0)$ corresponds to an energy minimum.
The bulk properties of this model have been studied in [5] with the replica
method. We briefly illustrate the results here, using the argument
originally developed by Derrida to solve the REM [6]. Consider two
neighboring layers, $t$ and $t+1$, whose overlap $q$, defined by

\[
q = \frac{1}{L} \sum_{i=1}^{L} s_i(t)\, s_i(t+1),
\]

has a fixed value. The average number of these configurations with energy
equal to $E$ is given by

\[
\mathcal{N}(E, q) \sim \exp\left( L S(q) - E^2/L \right), \tag{17}
\]

where $S(q) = \ln 2 - \frac{1}{2} \left[ (1+q)\ln(1+q) + (1-q)\ln(1-q) \right]$.
The typical value is equal to the average value if the latter is
exponentially large, and vanishes otherwise. We can thus write for a typical
sample:

\[
Z = \int_{L S(q) - E^2/L > 0} dE\, dq\; \mathcal{N}(E, q)\,
    \exp\left[ T \left( -kE + L\beta q \right) \right]. \tag{18}
\]

This expression is dominated by the saddle point in the free phase, and by
the smallest value of the energy (and $q = 1$) in the frozen phase. The
typical value of the smallest energy can be obtained from eq. (17), by
setting $\mathcal{N}(E, 0) \sim O(1)$, and is equal to $-L\sqrt{\ln 2}$. The
free energy is thus given by

\[
F = \begin{cases}
-TL \left[ \ln(2\cosh\beta) + k^2/4 \right], & \text{in the free phase;} \\
-TL \left( k\sqrt{\ln 2} + \beta \right), & \text{in the frozen phase.}
\end{cases} \tag{19}
\]

By comparison of the free energies, the transition line is located at [5]

\[
k_t(\beta) = 2 \left( \sqrt{\ln 2} - \sqrt{\beta - \ln(\cosh\beta)} \right). \tag{20}
\]
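Equation (20) is the smaller root of the quadratic obtained by equating the two branches of eq. (19). The sketch below, with hypothetical function names, checks this numerically and also evaluates the jump in the $k$-derivative of $-F/(TL)$ across the line, which equals $\sqrt{\ln 2} - k_t/2 = \sqrt{\beta - \ln\cosh\beta}$ and is nonzero for $\beta > 0$.

```python
import math

LN2 = math.log(2.0)

def f_free(k, beta):
    """-F/(TL) in the free phase, first branch of eq. (19)."""
    return math.log(2.0 * math.cosh(beta)) + k * k / 4.0

def f_frozen(k, beta):
    """-F/(TL) in the frozen phase, second branch of eq. (19)."""
    return k * math.sqrt(LN2) + beta

def k_transition_rem(beta):
    """Transition line of eq. (20)."""
    return 2.0 * (math.sqrt(LN2) - math.sqrt(beta - math.log(math.cosh(beta))))

def derivative_jump(beta):
    """Jump of d(-F/TL)/dk across the line: sqrt(ln 2) - k_t / 2."""
    return math.sqrt(LN2) - k_transition_rem(beta) / 2.0

for beta in (0.1, 0.5, 1.0, 2.0):
    kt = k_transition_rem(beta)
    print(beta, kt, f_free(kt, beta) - f_frozen(kt, beta), derivative_jump(beta))
```

At $\beta = 0$ the line starts from $k_t = 2\sqrt{\ln 2}$, the freezing point of the REM itself, and the jump vanishes there.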

We remark en passant that, contrary to the REM and other systems with
discontinuous glass transitions [8], here the transition is thermodynamically
of first order, with a latent heat that can be easily computed from eq. (19).

The surface (evolutionary) properties can be worked out as in the "master
sequence" case. Let us consider the frozen phase, and let us assume that
layer $T-2$ is frozen into one of the REM ground states. We focus on layer
$T-1$ and assume that layer $T$ is free. Layer $T$ has then no influence on
layer $T-1$, whose contribution to the partition sum is given by

\[
Z_{T-1} = \sum_{s(T-1)} \exp\left( -k E(s(T-1))
  + \beta \sum_{i} s_i(T-2)\, s_i(T-1) \right). \tag{21}
\]

Analyzing eq. (21) as above, we find a freezing transition into the ground
state, with the same conditions as for the bulk. Therefore, as long as
$k > k_t$, all layers but the last one are frozen in the energy ground state.
It is clear at this point that, all along the frozen phase,
$m = (1/L) \sum_i \langle s_i(0)\, s_i(t) \rangle$ is independent of $t$ for
$t > 0$, and is given by $m = \tanh\beta$. In the same way, the weight of the
optimal sequence behaves as in the "master sequence" model.
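The comparison behind this statement can be sketched in the same way as for the "master sequence" model, assuming that the ground state has energy $-L\sqrt{\ln 2}$ and that the free sum in eq. (21) is dominated by its saddle point. The labels "frozen" and "free" on $Z_{T-1}$ are notation introduced only for this illustration:

```latex
\[
\frac{Z_{T-1}^{\mathrm{frozen}}}{Z_{T-1}^{\mathrm{free}}}
\sim \frac{\exp\bigl[ L\,( k\sqrt{\ln 2} + \beta ) \bigr]}
          {\exp\bigl\{ L\,[\, \ln(2\cosh\beta) + k^2/4 \,] \bigr\}}
= \exp\bigl\{ L\,[\, k\sqrt{\ln 2} + \beta - \ln(2\cosh\beta) - k^2/4 \,] \bigr\}.
\]
```

The exponent vanishes on the line $k = k_t(\beta)$ of eq. (20), and its $k$-derivative there, $\sqrt{\ln 2} - k_t/2$, is positive for $\beta > 0$, so the frozen configuration dominates just above the line, as in the bulk.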

Summarizing, we have shown that in the "master sequence" and in the
REM landscapes for the quasispecies equation, in the limit in which one
can speak of sharp phase transitions, surface phenomena do not appear.
Indeed the surface passively follows the behavior of the bulk. This is due to
the pathology of the model, which is one-dimensional in the time direction,
but mean-field like in the sequence direction. The analysis also shows that
the "master sequence" and REM landscapes, apart from details, have very
similar evolutionary properties.

Figure captions

1. Order parameter $m$ in the "master sequence" model as a function of
the mutation rate $\mu$ for $L = 10, 20, 40, 80$. The selective inverse
temperature $k$ equals $\ln(4/3)$. Also plotted is the prediction of
eq. (13).

2. Weight $x_{s^\circ} = x_0$ of the master sequence as a function of the
total mutation rate $u = 1 - \exp(-\mu L)$ for $L = 10, 20, 40, 80$. We have
chosen $kL = 1/4$ so that $\mu_t L = 1/4$. Also plotted is the prediction [1]
$x_{s^\circ} = 1 - u/u_t$, where $u_t$ is the value of $u$ at the transition.